December 3, 2025
Can a country reduce emissions while growing its GDP? This question has gained significant attention in recent years as countries around the world try to pursue economic development while maintaining environmental sustainability. In this assignment, we examine global power-sector emissions and GDP data from more than 200 countries to explore whether sustained economic growth can occur alongside meaningful emission reductions.
Economic growth and greenhouse gas emissions have traditionally been closely linked. Increases in GDP are often accompanied by higher emissions because of greater energy consumption, expanding industrial activity and infrastructure growth. However, recent trends show that some countries are beginning to break this pattern. By analyzing emissions alongside GDP, we can identify which countries are achieving ‘decoupling’, where the economy grows without a corresponding rise in emissions. Decoupling can result from a combination of factors such as clean energy, improvements in energy efficiency, technological innovation and effective climate policies. Studying these trends provides valuable insights into how countries can pursue sustainable development, balancing economic progress with environmental responsibility, and can serve as a guide for other nations that aim to reduce their carbon footprint while maintaining economic growth.
First, let’s load all the required libraries.
set.seed(123)
# Load libraries
library(tidyverse)
library(dagitty)
library(ggdag)
library(janitor)
library(MASS)
library(modelsummary)
library(kableExtra)
library(knitr)
library(scales)
| Variables | Abbreviation |
|---|---|
| Economy | EC |
| Emissions Quantity | EQ |
| Energy Use | EU |
| Technology | T |
| Decoupling Year | DY |
| GDP Growth | GDP |
Here, we will work with the following data sets:
The first data set comes from the World Bank and uses the indicator NY.GDP.MKTP.CD, which reports national gross domestic product (GDP) in current U.S. dollars. It measures the total value of all goods and services produced in a country in a given year, converted into U.S. dollars using that year’s official exchange rate. Because GDP is reported in current dollars, the values reflect the prices and exchange rates of the respective year and are not adjusted for inflation or purchasing-power differences.
The second data set comes from Climate TRACE, which provides a detailed, open-access global inventory of greenhouse gas and air pollution emissions. It aggregates data from hundreds of millions of emission sources worldwide, including power plants, industrial facilities, transportation, agriculture and more, allowing emissions to be broken down by country, sector, sub-sector and time period. The data cover multiple years, starting in 2015 for annual country-level data, with monthly and source-level records available since 2021.
In this analysis, we focus specifically on the power sector data, which provides detailed emissions estimates, allowing us to examine trends and impacts within one of the largest sources of global greenhouse-gas emissions.
Question: To what extent do GDP and emissions trends account for the number of years in which countries achieve decoupling of economic growth from emissions?
Null Hypothesis (H0): There is no significant relationship between average GDP growth or average emission reduction and the number of decoupling years.
Alternative Hypothesis (HA): There is a significant relationship between average GDP growth or average emission reduction and the number of decoupling years.
In this project, we examine whether it is possible for countries to achieve emission reductions while maintaining GDP growth. A country-year counts as decoupling only when GDP increases while emissions decrease. The outcome we model, the number of decoupling years per country, is a count, and count data often show overdispersion (variance larger than the mean), which makes the Negative Binomial Model an appropriate choice for analysis.
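Whether a count outcome is overdispersed can be checked directly. A minimal sketch, using simulated data rather than the post's data: fit a Poisson GLM and compute the Pearson dispersion statistic (the sum of squared Pearson residuals divided by the residual degrees of freedom). Values near 1 are consistent with a Poisson model, while values well above 1 point toward a negative binomial model. The helper name `pearson_dispersion()` and the coefficients 0.5 and 0.3 are illustrative assumptions.

```r
# Sketch: checking for overdispersion with a Poisson GLM (simulated data, not the post's data)
set.seed(42)
n <- 2000
x <- runif(n, min = 1, max = 3)
mu <- exp(0.5 + 0.3 * x)  # log link with illustrative coefficients

y_pois <- rpois(n, lambda = mu)            # equidispersed counts: Var(Y) = mu
y_nb   <- rnbinom(n, mu = mu, size = 2)    # overdispersed counts: Var(Y) = mu + mu^2 / 2

# Hypothetical helper: Pearson chi-square divided by residual degrees of freedom
pearson_dispersion <- function(y, x) {
  fit <- glm(y ~ x, family = poisson)
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

disp_pois <- pearson_dispersion(y_pois, x)
disp_nb   <- pearson_dispersion(y_nb, x)
print(c(poisson = disp_pois, negative_binomial = disp_nb))
```

The same check could be run on the real decoupling counts before committing to a model.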
Before fitting the model to the actual data, we simulate data from a Negative Binomial Model with known parameters. This helps us understand the model and confirm that it recovers those parameters before we apply it to the real-world data set.
The statistical notation for the Negative Binomial Model is:
\[ \begin{aligned} \text{CountOutcome} &\sim \text{NegativeBinomial}(\mu, \theta) \\ \log(\mu) &= \beta_0 + \beta_1 \, \text{Predictor} \end{aligned} \]
The model can be fit with glm.nb() from the MASS package, using a call like the one below:
model <- glm.nb(count_outcome ~ predictor, data = my_data)
# Set the parameters
n <- 10000 # Data set size
beta0 <- 0.6
beta1 <- 1.8
r <- 2 # Dispersion
# Uniform distribution between 1 and 10
x <- runif(n, min = 1, max = 10)
# Calculate the mean (mu) by inverting the log link: mu = exp(linear predictor)
mu <- exp(beta0 + beta1 * x)
# Simulate the dependent variable (y)
y <- rnbinom(n = n, mu = mu, size = r)
# Create a data set
my_data <- data.frame(x, y)
# Fit a negative binomial model
negbinomial <- glm.nb(y ~ x, data = my_data)
# Create predictions from the model
my_data$prediction <- predict(negbinomial, type = "response")
# Simple scatter plot with fitted line
ggplot(my_data, aes(x = x, y = y)) +
geom_point(alpha = 0.8, size = 0.2, color = "hotpink") +
geom_line(aes(y = prediction), color = "darkgreen", linewidth = 1) +
labs(title = "Negative Binomial Model",
x = "X axis",
y = "Y axis") +
theme_minimal()
| | Negative Binomial |
|---|---|
| (Intercept) | 0.589 |
| | (0.017) |
| x | 1.802 |
| | (0.003) |
| Num.Obs. | 10000 |
| AIC | 226769.7 |
| BIC | 226791.3 |
| Log.Lik. | -113381.836 |
| RMSE | 13952620.72 |
Next, let’s check the estimated dispersion parameter theta, which controls the amount of overdispersion in the data.
[1] "Estimated Theta: 2.0202"
The estimated coefficients are almost the same as the values we used to simulate the data. This indicates that the model is specified correctly and recovers the negative binomial parameters we set.
| Parameter | True value | Estimated |
|---|---|---|
| beta0 | 0.6 | 0.59 |
| beta1 | 1.8 | 1.80 |
| theta | 2 | 2.02 |
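To make "almost the same" concrete, one can check whether each true parameter falls inside its 95% Wald interval. The sketch below re-simulates data with the same settings (it is not guaranteed to reproduce the exact numbers above) and uses `confint.default()` for the coefficients and the `SE.theta` component that `MASS::glm.nb()` stores for theta. The object names `fit` and `recovery` are illustrative.

```r
library(MASS)  # provides glm.nb()

set.seed(123)
n <- 10000
beta0 <- 0.6
beta1 <- 1.8
theta_true <- 2

x <- runif(n, min = 1, max = 10)
y <- rnbinom(n = n, mu = exp(beta0 + beta1 * x), size = theta_true)
fit <- glm.nb(y ~ x)

# Wald 95% intervals for the regression coefficients
coef_ci <- confint.default(fit)
# Wald 95% interval for theta, using the standard error stored by glm.nb()
theta_ci <- fit$theta + c(-1.96, 1.96) * fit$SE.theta

recovery <- data.frame(
  parameter = c("beta0", "beta1", "theta"),
  true      = c(beta0, beta1, theta_true),
  estimate  = unname(c(coef(fit), fit$theta)),
  lower     = unname(c(coef_ci[, 1], theta_ci[1])),
  upper     = unname(c(coef_ci[, 2], theta_ci[2]))
)
# TRUE when the true value lies inside its 95% interval
recovery$covered <- recovery$true >= recovery$lower & recovery$true <= recovery$upper
print(recovery)
```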
Now, we can run the negative binomial regression on our real data.
# Read the data for power sector
electricity <- read_csv(here::here('posts', '2025-12-global-emissions-gdp', 'data', 'power', 'DATA', 'electricity-generation_country_emissions_v5_1_0.csv'),
na = c(" ", "0", "0.0", "NA")) %>%
clean_names()
heat_plants <- read_csv(here::here('posts', '2025-12-global-emissions-gdp', 'data', 'power', 'DATA', 'heat-plants_country_emissions_v5_1_0.csv'),
na = c(" ", "0", "0.0", "NA")) %>%
clean_names()
other_energy <- read_csv(here::here('posts', '2025-12-global-emissions-gdp', 'data', 'power', 'DATA', 'other-energy-use_country_emissions_v5_1_0.csv'),
na = c(" ", "0", "0.0", "NA")) %>%
clean_names()
# Read the GDP data set
gdp <- read_csv(here::here('posts', '2025-12-global-emissions-gdp', 'data', 'API_NY.GDP.MKTP.CD_DS2_en_csv_v2_280632', 'API_NY.GDP.MKTP.CD_DS2_en_csv_v2_280632.csv'),
na = c(" ", "NA"),
skip = 4) %>%
clean_names()
Combine and clean power sub-sector data for yearly analysis.
We combine data from the electricity, heat plants and other energy sub-sectors to create a clean, year-wise dataset. This provides clear insights into trends and patterns in power-sector emissions, enabling meaningful comparisons across countries over time.
# Combine all three sub sector data sets
power <- bind_rows(electricity, heat_plants, other_energy)
# Extract the year from 'start_time' into a new 'year' column
power_year <- power %>%
mutate(year = lubridate::year(start_time))
# Rename the column and filter year 2025
power_year_cleaned <- power_year %>%
# Rename the column to match with gdp dataset
rename(country_code = iso3_country) %>%
# Filter out the year 2025 (no data of 2025 available for GDP)
filter(year != 2025)
Clean and reshape GDP data for further analysis.
We reshape the GDP dataset into a long format for annual analysis. This structure makes it easy to join with the emissions data and supports accurate, time-based comparisons.
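The wide-to-long reshape can be sketched in base R on a hypothetical two-country table (the country codes and values are made up). It shows what `pivot_longer()` plus the `str_replace()` step produce: one row per country and year, with the `x` prefix stripped from the year. In tidyr, the `names_prefix = "x"` argument of `pivot_longer()` could also remove the prefix directly.

```r
# Hypothetical wide table: one column per year, names prefixed with "x" (as after clean_names())
wide <- data.frame(country_code = c("AAA", "BBB"),
                   x2015 = c(1, 3),
                   x2016 = c(2, 4))

# Columns that hold yearly values
year_cols <- grep("^x[0-9]{4}$", names(wide), value = TRUE)

# Stack the year columns: unlist() walks column by column, so countries repeat within each year
long <- data.frame(
  country_code = rep(wide$country_code, times = length(year_cols)),
  year         = rep(as.numeric(sub("^x", "", year_cols)), each = nrow(wide)),
  gdp_values   = unlist(wide[year_cols], use.names = FALSE)
)
print(long)
```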
# Convert data to long format
gdp_long <- gdp %>%
# Drop 'x70' column and columns from x1960 to x2014
dplyr::select(-x70, -x1960:-x2014) %>%
pivot_longer(cols = starts_with("x"),
names_to = "year",
values_to = "gdp_values") %>%
# Remove 'x' from the 'year' string
mutate(year = stringr::str_replace(year, pattern = "x", replacement = ""),
# Convert the resulting string to a numeric data type
year = as.numeric(year))
Combine power sub-sector and GDP data for a streamlined structure and easy analysis.
By combining power sub-sector and GDP data, we create a dataset with comprehensive details of energy-related emissions in relation to GDP across countries and over time.
# Join the data frame and reorder the columns
power_gdp <- power_year_cleaned %>%
left_join(gdp_long, by = c("country_code", "year")) %>%
# Drop rows without a matching GDP record (missing or empty country name)
filter(!is.na(country_name), country_name != "") %>%
# Move the identifier columns to the front
dplyr::select(country_name, country_code, year, everything())
Here, we identify the ten countries with the highest emission levels over the past decade (2015–2024). This will make it easier to observe trends and interpret the decoupling years when we calculate them later.
# Calculate the mean emissions of countries for each year (2015 - 2024)
power_gdp_mean <- power_gdp %>%
drop_na(emissions_quantity) %>%
group_by(country_name, year) %>%
summarize(mean = mean(emissions_quantity),
.groups = 'drop') %>%
arrange(desc(mean))
# Top 10 countries by overall mean emissions for all years
top_10_emitter <- power_gdp_mean %>%
group_by(country_name) %>%
summarize(overall_mean = mean(mean), .groups = 'drop') %>%
arrange(desc(overall_mean)) %>%
slice_head(n = 10)
# Filter to keep only top 10 countries
power_gdp_mean_top10 <- power_gdp_mean %>%
filter(country_name %in% top_10_emitter$country_name)
# Visualize top ten emitting countries
ggplot(power_gdp_mean_top10, aes(x = year, y = mean, color = country_name)) +
geom_line() +
scale_x_continuous(breaks = seq(2015, 2024, by = 1)) +
scale_y_continuous(labels = scales::label_number(scale = 1e-6, suffix = "M t")) +
labs(x = "Year",
y = "Mean emissions (tonnes)",
title = "Top ten emitting countries from 2015 to 2024",
color = "Country") +
theme_classic()
Figure 3 shows that the highest-emitting countries are China, the United States, India, Russia, Japan, Saudi Arabia, Indonesia, South Africa, Korea and Iran.
Next, we identify the ten countries with the highest mean GDP over the past decade, from 2015 to 2024. This helps us highlight the largest economies and examine their potential impacts on energy consumption and emissions.
# Calculate the mean GDP of countries for each year (2015 - 2024)
power_gdp_mean <- power_gdp %>%
drop_na(gdp_values) %>%
group_by(country_name, year) %>%
summarize(mean = mean(gdp_values),
.groups = 'drop') %>%
arrange(desc(mean))
# Top 10 countries by overall mean GDP for all years
top_10_gdp <- power_gdp_mean %>%
group_by(country_name) %>%
summarize(overall_mean = mean(mean), .groups = 'drop') %>%
arrange(desc(overall_mean)) %>%
slice_head(n = 10)
# Filter to keep only top 10 countries
power_gdp_mean_gdp <- power_gdp_mean %>%
filter(country_name %in% top_10_gdp$country_name)
# Visualize top ten countries by mean GDP
# print() inside suppressWarnings() also silences warnings raised while the plot is drawn
suppressWarnings(print(
ggplot(power_gdp_mean_gdp, aes(x = year, y = mean, color = country_name)) +
geom_line(linewidth = 1) + # linewidth sets line thickness
geom_point(size = 2) + # size sets point size
scale_x_continuous(breaks = seq(2015, 2024, by = 1)) +
scale_y_continuous(labels = scales::label_number(scale = 1e-9, suffix = "B")) +
labs(
title = "Mean GDP of Top 10 Countries (2015–2024)",
x = "Year",
y = "Mean GDP (current US$)",
color = "Country"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
axis.title = element_text(size = 12),
legend.title = element_text(size = 12)
)
))
Figure 4 indicates that the countries with the highest GDP are the United States, China, Japan, Germany, the United Kingdom, India, France, Italy, Brazil and Canada.
We calculate total emissions and GDP for each country and year, and then compute the corresponding annual percentage changes. This shows how economic output and emissions have changed over time in each country.
# Aggregate GDP and emissions by country and year
country_totals <- power_gdp %>%
group_by(country_name, country_code, year) %>%
summarize(total_emissions = sum(emissions_quantity, na.rm = TRUE),
gdp_values = first(gdp_values),
.groups = 'drop')
# Calculate annual changes in GDP and emissions
annual_changes <- country_totals %>%
arrange(country_code, year) %>%
group_by(country_name, country_code) %>%
mutate(emissions_pct_change = (total_emissions - lag(total_emissions)) / lag(total_emissions) * 100,
gdp_pct_change = (gdp_values - lag(gdp_values)) / lag(gdp_values) * 100) %>%
ungroup()
Next, we identify the country-years in which GDP increased while emissions decreased. These are instances of environmental-economic decoupling, and counting them for each country gives a clear view of successful decoupling cases.
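The decoupling rule can be illustrated on a hypothetical toy panel (two made-up countries, not real data). This sketch uses base R rather than dplyr: `pct_change()` is an illustrative helper that mirrors the `(x - lag(x)) / lag(x) * 100` formula used above, and a year counts as decoupling only when the emissions change is negative and the GDP change is positive. The first year of each country has no previous value, so it can never count.

```r
# Hypothetical toy panel: two made-up countries, four years each (not real data)
toy <- data.frame(
  country   = rep(c("A", "B"), each = 4),
  year      = rep(2015:2018, times = 2),
  emissions = c(100, 95, 97, 90,  50, 55, 60, 58),
  gdp       = c(10, 11, 12, 13,   5, 5.5, 5.4, 6)
)

# Percentage change from the previous year (NA for the first year)
pct_change <- function(v) c(NA, diff(v) / head(v, -1) * 100)

toy <- toy[order(toy$country, toy$year), ]
toy$emissions_pct_change <- ave(toy$emissions, toy$country, FUN = pct_change)
toy$gdp_pct_change       <- ave(toy$gdp, toy$country, FUN = pct_change)

# A decoupling year: emissions fell while GDP rose (first years are excluded via is.na)
toy$decoupled <- with(toy, !is.na(emissions_pct_change) &
                        emissions_pct_change < 0 & gdp_pct_change > 0)

# Count decoupling years per country
decoupling_years <- tapply(toy$decoupled, toy$country, sum)
print(decoupling_years)
```

Country A decouples in 2016 and 2018 (emissions fall while GDP keeps rising), and country B only in 2018, so the counts are 2 and 1.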
# Find the cases where emissions decreased while GDP increased
decoupling <- annual_changes %>%
filter(emissions_pct_change < 0 & gdp_pct_change > 0) %>%
dplyr::select(country_name, year, emissions_pct_change, gdp_pct_change)
# Find countries where emissions decreased while GDP increased
decoupling_case <- decoupling %>%
group_by(country_name) %>%
summarize(decoupling_years = n(),
avg_emission_reduction = mean(emissions_pct_change),
avg_gdp_growth = mean(gdp_pct_change),
.groups = 'drop') %>%
arrange(desc(decoupling_years))
# Filter the top ten countries with decoupling effects
decoup_ten <- decoupling_case %>%
arrange(desc(decoupling_years)) %>%
slice_head(n = 10)
# Plot a graph for top ten countries
ggplot(decoup_ten, aes(x = reorder(country_name, decoupling_years),
y = decoupling_years)) +
geom_col(fill = 'coral2') +
geom_text(aes(label = decoupling_years), hjust = -0.3, size = 3.5) +
coord_flip() +
theme_classic() +
labs(x = "Top ten countries",
y = "Number of years with decoupling",
title = "Top 10 countries: Years of emission reduction\n while increasing GDP") +
ylim(0, max(decoup_ten$decoupling_years) * 1.1)
Figure 5 highlights the ten countries with the most decoupling years: Ireland, Romania, Bulgaria, Luxembourg, the Netherlands, Poland, Portugal, Ukraine, the United States and Australia.
These figures reveal an interesting relationship between economic growth and global emissions. Countries with the highest greenhouse gas emissions are typically those with substantial economic resources and intensive industrial activity. Likewise, the nations with the largest GDPs are predominantly highly developed and economically influential. China, the United States and India appear on both lists, reflecting their positions as large, populous economies with significant global production footprints. However, when we look at decoupling, defined as increasing GDP accompanied by declining emissions, the United States is the only country from either list that also ranks among the ten countries with the most decoupling years over the past decade (2015 to 2024). This suggests that it may be challenging for countries to balance economic growth with meaningful emissions reductions.
We fit a negative binomial model to the number of decoupling years per country, with average GDP growth and average emission reduction as predictors, to quantify the relationship between economic growth and carbon emissions. This allows us to test whether these predictors are associated with the number of decoupling years and to make predictions or draw inferences based on the model.
\[ \begin{aligned} \mathrm{decoupling\_years} &\sim \text{NegativeBinomial}(\mu, \theta) \\ \log(\mu) &= \beta_0 + \beta_1 \, \mathrm{avg\_emission\_reduction} + \beta_2 \, \mathrm{avg\_gdp\_growth} \end{aligned} \]
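Now, let’s fit the negative binomial model to the number of decoupling years.
# Fit the negative binomial model to the country-level decoupling counts
nb_model <- glm.nb(decoupling_years ~ avg_gdp_growth + avg_emission_reduction,
data = decoupling_case)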
We extract the model coefficients and display them in a table to facilitate interpretation and comparison.
# Extract coefficient summary
coef_table <- coef(summary(nb_model))
# Create a clean data frame
results_table <- data.frame(
Predictor = c("Intercept", "Average GDP Growth", "Average Emission Reduction"),
Estimate = coef_table[, "Estimate"],
SE = coef_table[, "Std. Error"],
`z value` = coef_table[, "z value"],
`p value` = coef_table[, "Pr(>|z|)"])
# Drop the coefficient names kept as row names so only the labelled Predictor column is shown
rownames(results_table) <- NULL
# Create pretty table
kable(results_table,
digits = 4,
col.names = c("Predictor", "Estimate (β)", "Std. Error", "z-value", "p-value"),
caption = "Coefficients of Negative Binomial Regression",
align = c("l", "r", "r", "r", "r")) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
full_width = FALSE,
position = "center") %>%
row_spec(0, bold = TRUE, background = "#f0f0f0")
| Predictor | Estimate (β) | Std. Error | z-value | p-value |
|---|---|---|---|---|
| Intercept | 1.0423 | 0.0895 | 11.6431 | 0.0000 |
| Average GDP Growth | -0.0141 | 0.0075 | -1.8903 | 0.0587 |
| Average Emission Reduction | -0.0090 | 0.0046 | -1.9407 | 0.0523 |
We extract the estimated theta to evaluate the degree of dispersion and the precision of the estimate.
Note: In the negative binomial model, the dispersion parameter theta captures the degree of overdispersion in the data, that is, how much the variance exceeds the mean. A higher theta indicates less overdispersion, whereas a lower theta indicates stronger overdispersion.
The estimated dispersion parameter theta is extremely large (~59,399.34), which indicates that the data show almost no overdispersion. A theta of this magnitude means that the variance of the outcome is very close to the mean, so the model behaves almost like a Poisson regression.
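Under the negative binomial parameterization used by glm.nb() (the same one rnbinom() uses with size equal to theta), the variance is μ + μ²/θ. Plugging in the baseline expected count of about 2.84 decoupling years (the exponentiated intercept, exp(1.0423)) and the estimated θ shows how small the extra-Poisson term is:

\[ \begin{aligned} \operatorname{Var}(Y) &= \mu + \frac{\mu^2}{\theta} \\ &\approx 2.84 + \frac{2.84^2}{59{,}399.34} \approx 2.84 + 0.000136 \approx 2.84 \end{aligned} \]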
Here, we interpret the coefficients of our model. The coefficients of the negative binomial regression show that both average GDP growth and average emission reduction have weak relationships with the number of decoupling years.
The intercept of 1.0423 is statistically significant (p < 0.001) and represents the expected log count of decoupling years when both average GDP growth and average emission reduction are zero. On the count scale, this corresponds to exp(1.0423) ≈ 2.84 expected decoupling years.
The coefficient for average GDP growth is -0.0141 (p = 0.0587), which is not statistically significant at the 5% level. The negative sign suggests that for every one-percentage-point increase in average GDP growth, the expected log count of decoupling years decreases by 0.0141, holding emission reduction constant.
The coefficient for average emission reduction is -0.0090 (p = 0.0523), which is marginally significant but still above the 5% threshold. Because this variable is the mean percentage change in emissions during decoupling years (a negative number), a one-unit increase corresponds to a reduction that is one percentage point smaller; such an increase lowers the expected log count of decoupling years by 0.0090, holding GDP growth constant.
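Log-scale coefficients are often easier to read after exponentiating them into rate ratios. This sketch simply re-uses the rounded estimates and standard errors from the coefficient table (so its intervals are approximate) and computes the baseline expected count, the rate ratios, their percentage effects and Wald 95% intervals. The variable names are illustrative.

```r
# Estimates and standard errors as reported in the coefficient table (rounded to 4 decimals)
estimates  <- c(intercept = 1.0423, avg_gdp_growth = -0.0141, avg_emission_reduction = -0.0090)
std_errors <- c(intercept = 0.0895, avg_gdp_growth = 0.0075, avg_emission_reduction = 0.0046)

# Expected number of decoupling years when both predictors are zero
baseline_count <- exp(estimates[["intercept"]])

# Multiplicative change in the expected count per one-unit increase (rate ratio)
rate_ratio <- exp(estimates[-1])
pct_effect <- (rate_ratio - 1) * 100

# Approximate Wald 95% intervals on the rate-ratio scale
lower <- exp(estimates[-1] - 1.96 * std_errors[-1])
upper <- exp(estimates[-1] + 1.96 * std_errors[-1])

print(round(baseline_count, 2))
print(round(cbind(rate_ratio, pct_effect, lower, upper), 4))
```

With these values the baseline is about 2.84 decoupling years, the rate ratios are about 0.986 and 0.991 (roughly 1.4% and 0.9% fewer expected decoupling years per one-unit increase), and both approximate intervals contain 1, which is consistent with p-values above 0.05.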
These results are consistent with the challenges of achieving decoupling discussed in the data analysis section above, although neither effect is significant at the 5% level. The negative relationship between GDP growth and decoupling years suggests that countries with rapid economic growth may find it harder to sustain decoupling. One possibility is that faster growth increases energy demand and industrial activity enough to outpace emission reduction efforts. The pattern could also reflect confounding variables that need further investigation. Given that the United States is the only top emitter or top economy among the ten countries with the most decoupling years between 2015 and 2024, these marginal effects align with the broader narrative that balancing economic growth with meaningful emissions reductions remains exceptionally challenging for most countries.
We create a prediction grid for average GDP growth and average emission reduction, then use the fitted negative binomial model to generate predictions and their standard errors. Using these, we construct 95% confidence intervals that give a range of predicted values for each case.
# Create a prediction grid
prediction_grid <- expand_grid(
avg_gdp_growth = seq(min(decoupling_case$avg_gdp_growth),
max(decoupling_case$avg_gdp_growth),
length.out = 50),
avg_emission_reduction = mean(decoupling_case$avg_emission_reduction))
# Generate predictions and standard errors on link scale
prediction_se <- predict(nb_model,
newdata = prediction_grid,
type = "link",
se.fit = TRUE)
# Construct confidence intervals
prediction_ci <- prediction_grid %>%
mutate(log_mu = prediction_se$fit,
log_mu_se = prediction_se$se.fit,
# Calculate 95% CI on link scale using ±1.96 SE
log_mu_lwr = log_mu - 1.96 * log_mu_se,
log_mu_upr = log_mu + 1.96 * log_mu_se,
# Transform from link scale (log) to response scale (counts)
mu = exp(log_mu),
mu_lwr = exp(log_mu_lwr),
mu_upr = exp(log_mu_upr))
Note that the prediction grid defines a set of hypothetical scenarios for the model to predict. For GDP growth, we span the full observed range, from the minimum to the maximum, to see how predictions change from low to high GDP growth. For emission reduction, we hold the value constant at its mean so that we can focus on how changes in GDP growth affect the predicted outcome.
# Plot with single ribbon
ggplot() +
# Add the 95% CI ribbon
geom_ribbon(data = prediction_ci,
aes(x = avg_gdp_growth, ymin = mu_lwr, ymax = mu_upr),
fill = 'skyblue', alpha = 0.2) +
# Plot the prediction line
geom_line(data = prediction_ci,
aes(x = avg_gdp_growth, y = mu),
color = 'hotpink', linewidth = 1) +
# Plot the observed decoupling cases
geom_point(data = decoupling_case,
aes(x = avg_gdp_growth, y = decoupling_years),
color = 'darkgreen', alpha = 0.8, size = 0.4) +
labs(title = 'Cases of decoupling: Rising GDP and declining emissions',
subtitle = 'Shaded area shows 95% CI at mean emission reduction',
x = 'Average GDP growth (%)',
y = 'Expected number of decoupling years') +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5, face = 'bold'),
plot.subtitle = element_text(hjust = 0.5))
The width of the confidence interval indicates how certain the model is about its predictions: a narrower interval means more certainty, and a wider interval means greater uncertainty. Here, the interval widens toward the extremes, so the model is less certain about predictions for very low or very high GDP growth rates, where there are far fewer observations.
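The widening of the interval toward the edges can be reproduced on simulated data. In this sketch (assumed settings: a normally distributed predictor, coefficients 1 and 0.1, theta = 5), most observations sit near the centre, and the 95% interval, built on the link scale and then exponentiated as in the post, is wider at both ends of the grid than in the middle. Because the bounds are exponentiated, they also stay positive and are asymmetric around the predicted count.

```r
library(MASS)  # provides glm.nb()

set.seed(7)
n <- 300
x <- rnorm(n, mean = 2, sd = 1)  # dense near the centre, sparse in the tails
y <- rnbinom(n, mu = exp(1 + 0.1 * x), size = 5)
fit <- glm.nb(y ~ x)

# Prediction grid spanning the observed range of x
grid <- data.frame(x = seq(min(x), max(x), length.out = 50))
pred <- predict(fit, newdata = grid, type = "link", se.fit = TRUE)

# 95% CI on the link (log) scale, then transformed back to counts
grid$mu    <- exp(pred$fit)
grid$lower <- exp(pred$fit - 1.96 * pred$se.fit)
grid$upper <- exp(pred$fit + 1.96 * pred$se.fit)
grid$width <- grid$upper - grid$lower

print(grid[c(1, 25, 50), ])
```

The same pattern appears even with evenly spread data, since the standard error of the linear predictor grows with distance from the centre of the predictor values; sparse data in the tails makes it more pronounced.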
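Because neither average GDP growth (p = 0.0587) nor average emission reduction (p = 0.0523) is statistically significant at the 5% level, we fail to reject the null hypothesis.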
This study shows how economic growth and environmental sustainability are related and highlights the global challenge of effective climate action. Although China, the United States and India rank high in both emissions and GDP because of their large economies and intensive industrial activity, only the United States also ranks among the ten countries with the most decoupling years between 2015 and 2024. This means that most of the largest emitters and economies struggled to grow while also reducing emissions. One possible explanation is that expanding economies increase energy use, technology use and industrial activity, confounding variables that may indirectly influence emissions and that were not included in our model. These activities may grow faster than emission reduction efforts. That only one of the world's largest emitters or economies appears among the top ten decoupling countries shows how rare environmental-economic decoupling still is among major economies. Overall, the findings suggest that achieving decoupling requires stronger policies, new technologies and long-term economic changes that allow countries to grow without harming the environment.
@online{poudel2025,
author = {Poudel, Aakriti},
title = {Global {Emissions} and {GDP}},
date = {2025-12-03},
url = {https://aakriti-poudel-chhetri.github.io/posts/2025-12-global-emissions-gdp/},
langid = {en}
}